Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UnwrappingReuseStrategy for AnalyzerWrapper #14154

Merged

Conversation

mayya-sharipova
Copy link
Contributor

Add a new reuse strategy WrappingReuseStrategy that consults
he wrapped analyzer's strategy to decide if components can be
reused or need to be updated.

Add an optional constructor to CompletionAnalayzer that uses this strategy.

Add a new reuse strategy WrappingReuseStrategy that consults
the wrapped analyzer's strategy to decide if components can be
reused or need to be updated.

Add an optional constructor to CompletionAnalayzer that uses this
strategy.
Copy link

@john-wagster john-wagster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API is a little concerning. I do not immediately know a better way to handle both wrappedComponents and also the inner storedValue :(

Comment on lines 124 to 136
public CompletionAnalyzer(
Analyzer analyzer,
boolean preserveSep,
boolean preservePositionIncrements,
ReuseStrategy fallbackStrategy) {
super(new WrappingReuseStrategy(fallbackStrategy));
// häckidy-hick-hack, because we cannot call super() with a reference to "this":
((WrappingReuseStrategy) getReuseStrategy()).setUp(this, analyzer, wrappedComponents);
this.analyzer = analyzer;
this.preserveSep = preserveSep;
this.preservePositionIncrements = preservePositionIncrements;
this.maxGraphExpansions = ConcatenateGraphFilter.DEFAULT_MAX_GRAPH_EXPANSIONS;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   /**
   * JAVA DOCS about when you should use this and what it does
   */
   public static CompletionAnalyzer withWrappingReuseStrategy(
      Analyzer analyzer,
      boolean preserveSep,
      boolean preservePositionIncrements,
      ReuseStrategy fallbackStrategy) {
    CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer, preserveSep, preservePositionIncrements, fallbackStrategy);
    ((WrappingReuseStrategy)completionAnalyzer.getReuseStrategy()).setUp(completionAnalyzer, analyzer, completionAnalyzer.wrappedComponents);
    return completionAnalyzer;
  }

What if we made the ctor private, removed the setup hack and did it instead within a public static method?

Comment on lines 172 to 179
public void setUp(
AnalyzerWrapper wrapper,
Analyzer wrappedAnalyzer,
CloseableThreadLocal<TokenStreamComponents> wrappedComponents) {
this.wrapper = wrapper;
this.wrappedAnalyzer = wrappedAnalyzer;
this.wrappedComponents = wrappedComponents;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bugs me. We allow a reuse strategy to be mutated after construction. It seems ready to cause a bug.


@Override
public TokenStreamComponents getReusableComponents(Analyzer analyzer, String fieldName) {
if (analyzer == wrapper) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to know the instance, or can we do Analyzer analyzer instanceof WrapperAnalyzer wrapperAnalyzer then get the strategy from wrapperAnalyzer.getWrappedAnalyzer(fieldName).getReuseStategy().getReusableComponents...?

Comment on lines 196 to 205
public void setReusableComponents(
Analyzer analyzer, String fieldName, TokenStreamComponents components) {
if (analyzer == wrapper) {
setStoredValue(analyzer, components);
wrappedAnalyzer
.getReuseStrategy()
.setReusableComponents(wrappedAnalyzer, fieldName, wrappedComponents.get());
} else {
fallbackStrategy.setReusableComponents(analyzer, fieldName, components);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to know the instance, or can we do Analyzer analyzer instanceof WrapperAnalyzer wrapperAnalyzer then get the wrapped analyzer via getWrappedAnalyzer(fieldName)?

* be re-created. During components creation, this analyzer must store the wrapped analyzer's
* components in {@code wrappedComponents} local thread variable.
*/
public static final class WrappingReuseStrategy extends ReuseStrategy {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the better name is UnWrapping as if we find that the analyzer is a wrapped analyzer, we unwrap it and use the underlying analyzer?

Comment on lines 172 to 179
public void setUp(
AnalyzerWrapper wrapper,
Analyzer wrappedAnalyzer,
CloseableThreadLocal<TokenStreamComponents> wrappedComponents) {
this.wrapper = wrapper;
this.wrappedAnalyzer = wrappedAnalyzer;
this.wrappedComponents = wrappedComponents;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see that the "wrappedComponents" is key here. its tricky, as the analyzer provides access directly to storedValue and there is no way to override :(.

That seems like a bad API already, but we don't want to change all these long lived interfaces unnecessarily.

@jpountz
Copy link
Contributor

jpountz commented Jan 22, 2025

I don't like that CompletionAnalyzer needs to track a thread-local, the point of reuse strategy is to avoid this kind of thing. Also I'm not sure I understand why CompletionAnalyzer is different from other analyzer wrappers?

@mayya-sharipova
Copy link
Contributor Author

@benwtrent Thanks for the review, I am not happy with the design either, will see how I can incorporate your feedback.

I don't like that CompletionAnalyzer needs to track a thread-local, the point of reuse strategy is to avoid this kind of thing. Also I'm not sure I understand why CompletionAnalyzer is different from other analyzer wrappers?

@jpountz Thanks for looking. The idea of having an extra thread-local is for AnalyzerWrapper to keep track of wrappedComponents, but may be it doesn't need to be thread-local and can be just a local variable.
You are right, CompletionAnalyzer is not different from other analyzer wrappers, and I can extend this policy to all AnalyzerWrapper. CompletionAnalyzer is where we experienced a problem in Elasticsearch, where it did not respect a reuse policy of the wrapped analyzer.

@mayya-sharipova mayya-sharipova changed the title Add WrappingReuseStrategy for AnalyzerWrapper Add UnwrappingReuseStrategy for AnalyzerWrapper Jan 23, 2025
@mayya-sharipova
Copy link
Contributor Author

@jpountz @benwtrent I've addressed your comments in the last commit, please continue to review

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry @mayya-sharipova I am having a difficult time understanding the change and what its supposed to be doing.

Is the idea that we don't reuse wrapped components if the unwrapped component themselves cannot be reused?

@mayya-sharipova
Copy link
Contributor Author

I syncrhonized with @benwtrent offline, and we decided to use to extend TokenStreamComponents, so that it has access to wrapped components.

@benwtrent please review the last changes. Thank you.

@benwtrent
Copy link
Member

I will have to review later, but first glance is way cleaner than before!

Thank you

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of comments, but its looking good.

Comment on lines 189 to 190
if (analyzer instanceof AnalyzerWrapper wrapper
&& components instanceof TokenStreamComponentsWrapper wrapperComponents) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add an assertion where if its passed an AnalyzerWrapper that the components are indeed instanceof TokenStreamComponentsWrapper?

* A {@link Analyzer.TokenStreamComponents} that decorates the wrapper with access to the wrapped
* components.
*/
public static final class TokenStreamComponentsWrapper extends TokenStreamComponents {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public? Could it maybe be package private?

Comment on lines 219 to 222
@Override
protected void setReader(final Reader reader) {
wrapper.setReader(reader);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, do we need to override this? Isn't the default behavior simply setting the source which we already correctly provide via wrapper.getSource()?

@@ -387,7 +387,7 @@ public TokenStreamComponents(final Tokenizer tokenizer) {
*
* @param reader a reader to reset the source component
*/
private void setReader(final Reader reader) {
void setReader(final Reader reader) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment about if we actually need to override this.

@benwtrent benwtrent added this to the 10.2.0 milestone Feb 3, 2025
Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks so much better! Thank you for iterating on the API. I like it.

@mayya-sharipova mayya-sharipova merged commit ce2a917 into apache:main Feb 5, 2025
6 checks passed
@mayya-sharipova mayya-sharipova deleted the wrapping-reuse-strategy branch February 5, 2025 19:00
mayya-sharipova added a commit that referenced this pull request Feb 5, 2025
Add a new reuse strategy WrappingReuseStrategy that consults
the wrapped analyzer's strategy to decide if components can be
reused or need to be updated.

Co-authored-by: Benjamin Trent <[email protected]>
@jpountz
Copy link
Contributor

jpountz commented Feb 10, 2025

I don't feel good about this change (at least not yet). It looks like there is an analyzer that changes its components over time somewhere, and this change aims at making sure that an analyzer that wrap this analyzer would also update its components. I worry that this is just one place where the assumption that analysis components are immutable is violated, and that there are other ones that haven't been found yet. I'm curious if other alternatives were considered, e.g. atomically swapping the analyzer or updating the behavior of analysis components without needing to replace them.

For reference, a side-effect of this change is that analysis components are now cached both on the wrapper and on the wrapped analyzer, while they were only cached on the wrapper before.

@mayya-sharipova
Copy link
Contributor Author

@jpountz Thank you for your feedback. I am happy to discuss alternatives.

Isn't the whole idea of ReuseStrategy to decide if an analyzer's components can be reused or should be recreated/swapped?

I worry that this is just one place where the assumption that analysis components are immutable is violated, and that there are other ones that haven't been found yet. I'm curious if other alternatives were considered, e.g. atomically swapping the analyzer or updating the behavior of analysis components without needing to replace them.

Yes, indeed in this PR analysis components are immutable, they are recreated/ swapped in an analyzer


For the context: this PR came from a need to have an updateable synonyms analyzer when it is wrapped in another analyzer, for example CompletionAnalyzer. In Elasticsearch an analyzer with updateable synonyms has a custom ReuseStrategy that creates a new components when synonyms are updated. We now want CompletionAnalyzer also update its components when its wrapped synonyms analyzer updated synonyms.

@jpountz
Copy link
Contributor

jpountz commented Feb 11, 2025

I suspect that ReuseStrategy hasn't been built with the idea that it could be use to invalidate components, but only as a way to describe an efficient caching strategy for analysis components, typically whether analysis components can be cached globally (GLOBAL_REUSE_STRATEGY) or need a different cache entry on each field (PER_FIELD_REUSE_STRATEGY).

@uschindler I believe that you are the one who designed this logic, I'd be interested in your thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants